Generating Performance Portable Code using Rewrite Rules: From High-Level Functional Expressions to High-Performance OpenCL Code


Abstract

Computers have become increasingly complex with the emergence of heterogeneous hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous computational power at the cost of increased programming effort, resulting in a tension between performance and code portability. Typically, code is either tuned in a low-level imperative language using hardware-specific optimizations to achieve maximum performance, or written in a high-level, possibly functional, language to achieve portability at the expense of performance. We propose a novel approach aiming to combine high-level programming, code portability, and high performance. Starting from a high-level functional expression, we apply a simple set of rewrite rules to transform it into a low-level functional representation, close to the OpenCL programming model, from which OpenCL code is generated. Our rewrite rules define a space of possible implementations, which we automatically explore to generate hardware-specific OpenCL implementations. We formalize our system with a core dependently typed calculus along with a denotational semantics, which we use to prove the correctness of the rewrite rules. We test our design in practice by implementing a compiler which generates high-performance imperative OpenCL code. Our experiments show that we can automatically derive hardware-specific implementations from simple functional high-level algorithmic expressions, offering performance on a par with highly tuned code for multicore CPUs and GPUs written by experts.
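
To make the idea of semantics-preserving rewrite rules concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' compiler or calculus: the mini-IR (`Map`, `Split`, `Join`, `Seq`), the interpreter `evaluate`, and the two rules `split_join` and `fuse_maps` are hypothetical names chosen for this example. Each rule rewrites a high-level expression into a form that exposes more structure (chunking or fusion) while computing the same result.

```python
from dataclasses import dataclass

# Hypothetical mini-IR for illustration; not the paper's actual representation.
@dataclass(frozen=True)
class Map:
    f: object          # a plain Python function or another IR node

@dataclass(frozen=True)
class Split:
    n: int             # chunk size

@dataclass(frozen=True)
class Join:
    pass

@dataclass(frozen=True)
class Seq:
    stages: tuple      # stages are applied left to right


def evaluate(expr, x):
    """Reference interpreter: gives each IR node its meaning on Python lists."""
    if isinstance(expr, Map):
        return [evaluate(expr.f, e) for e in x]
    if isinstance(expr, Split):
        return [x[i:i + expr.n] for i in range(0, len(x), expr.n)]
    if isinstance(expr, Join):
        return [e for chunk in x for e in chunk]
    if isinstance(expr, Seq):
        for stage in expr.stages:
            x = evaluate(stage, x)
        return x
    return expr(x)     # plain Python function applied to one element


def split_join(expr, n):
    """Rule sketch: map f  ->  split n ; map (map f) ; join."""
    if isinstance(expr, Map):
        return Seq((Split(n), Map(Map(expr.f)), Join()))
    return expr        # rule does not apply


def fuse_maps(expr):
    """Rule sketch: map f ; map g  ->  map (f ; g)."""
    if (isinstance(expr, Seq) and len(expr.stages) == 2
            and all(isinstance(s, Map) for s in expr.stages)):
        f, g = expr.stages[0].f, expr.stages[1].f
        return Map(Seq((f, g)))
    return expr        # rule does not apply


def double(v):
    return v * 2

def inc(v):
    return v + 1

# Usage: both rewritten forms agree with the original on sample input.
xs = list(range(10))
high_level = Map(double)
chunked = split_join(high_level, 4)
pipeline = Seq((Map(double), Map(inc)))
fused = fuse_maps(pipeline)
assert evaluate(chunked, xs) == evaluate(high_level, xs)
assert evaluate(fused, xs) == evaluate(pipeline, xs)
```

In this sketch, the chunked form is the kind of shape that could be mapped onto a two-level parallel hierarchy, and the fused form avoids an intermediate list. Choosing which rules to apply, and with which parameters such as the chunk size, is what spans the space of implementations the abstract refers to.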
